Enriching Patent Search with External Keywords: a Feasibility Study

Authors

  • Ivelina Nikolova
  • Irina P. Temnikova
  • Galia Angelova
Abstract

This article presents a feasibility study for retrieving Wikipedia articles that match patents' topics. The long-term motivation is to facilitate patent search by enriching patent indexing with relevant keywords found in external (terminological) resources, together with their monolingual synonyms and multilingual translations. The similarity between patents and Wikipedia articles is measured using various filtering techniques and patent document sections. In 33% of the cases the most similar Wikipedia articles are ranked closest to the respective patent; otherwise they are within the top 12 ranked articles.

1 Motivations and Related Work

Patent documents exhibit structural uniformity (Alberts et al., 2011) and are assigned classification codes, yet patent search is not a trivial task. This is due to the large number of patents available worldwide (forty million) (Hunt et al., 2007) and to their specific language genre. Invention descriptions usually aim to cover the widest possible application area and are intentionally left very vague. Thus patents do not follow a pre-established terminology; they are written according to the specific lexicon and style of each inventor (Alberts et al., 2011). Patent applications are published before the granting decision, so their titles and abstracts are intentionally kept very general (Adams, 2010a). Moreover, the internationally used classification hierarchies vary among institutions and are periodically changed. Current NLP technologies provide insufficient support for patent searchers' needs (Lupu et al., 2011; Adams, 2010a). Full-text search is the preferred type of patent search when examining a patent application to establish its novelty, patentability, and infringement (Adams, 2010a). Search proceeds through iterative attempts, using synonyms to catch the alternative expressions each inventor may use to describe the same concept (Hunt et al., 2007).
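The iterative synonym search described above is essentially query expansion. The following is a minimal sketch of it in Python. It assumes a hypothetical in-memory synonym table (`SYNONYMS`) and plain whitespace tokenization. Each query term becomes an OR-group holding the term and its synonyms, and a document matches only if every group is represented in it. The table, the sample documents, and the functions `expand_query` and `matches` are illustrative assumptions, not part of the paper; they only show why synonyms raise recall.

```python
# Hypothetical synonym table. In the approach proposed here, such entries would be
# harvested from an external terminological resource such as Wikipedia.
SYNONYMS: dict[str, list[str]] = {
    "fastener": ["screw", "bolt", "rivet"],
    "vehicle": ["car", "automobile"],
}

# Hypothetical sample documents, each describing a concept in inventor-specific words.
DOCS = [
    "A bolt secures the automobile door",
    "A screw for furniture assembly",
    "The vehicle uses a rivet",
]


def expand_query(terms: list[str], synonyms: dict[str, list[str]]) -> list[set[str]]:
    """Turn each query term into an OR-group holding the term and its synonyms."""
    groups = []
    for term in terms:
        key = term.lower()
        groups.append({key, *(s.lower() for s in synonyms.get(key, []))})
    return groups


def matches(doc: str, groups: list[set[str]]) -> bool:
    """AND across groups, OR within a group: each concept must appear under some name."""
    tokens = set(doc.lower().split())
    return all(group & tokens for group in groups)


if __name__ == "__main__":
    plain = [{t} for t in ("fastener", "vehicle")]
    expanded = expand_query(["fastener", "vehicle"], SYNONYMS)
    print([i for i, d in enumerate(DOCS) if matches(d, plain)])     # []
    print([i for i, d in enumerate(DOCS) if matches(d, expanded)])  # [0, 2]
```

Without the synonym table, the query "fastener vehicle" matches none of the three sample documents. With the table, documents 0 and 2 match. The cost is precision: each added synonym can also match documents that use the word in an unrelated sense.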
It can take a specialist up to 40 hours (12 on average) to complete the search task for 15 queries in 100 documents, including a minimum of 5 minutes to formulate a single query (Joho et al., 2010). Another specific requirement is that patent searchers need the highest possible recall, because a single missed relevant document can invalidate an otherwise sound patent (Lupu et al., 2011). Our original idea is to use Wikipedia as a free, multilingual and constantly updated terminology resource, in order to enrich patent indexing with monolingual term synonyms and their translations in multiple languages. This would increase patent search recall, and it is our proposed solution to the problem of vague and inventor-specific term definitions. Wikipedia is constantly updated; despite the many criticisms of the reliability of Wikipedia articles [1], its peer-review nature compensates for them (Giles, 2005). Thus the automatic recognition of Wikipedia articles relevant to a patent's topic is a first experimental step towards enriching patent indexing with Wikipedia terms. As many Wikipedia article titles are homonyms (usually listed on disambiguation pages [2]), recognition must rely on the full text of the articles rather than on their titles alone.

Related Work in NLP for patents. Most of the NLP approaches contributing to patent search have been published in the CLEF-IP [3] and TREC-CHEM [4] tracks, in the NTCIR workshops' patent tracks for Japanese, and in the PaIR [5] workshops. Lupu et al. (2011) provide a very good overview of the state of the art of IR technologies for patents and of how well they respond to users' needs. Multilinguality in patent search is

[1] http://en.wikipedia.org/wiki/Reliability_of_Wikipedia
[2] http://en.wikipedia.org/wiki/Wave_%28disambiguation%29
[3] http://www.ifs.tuwien.ac.at/ clef-ip/index.html
[4] http://www.ir-facility.org/trec-chem
[5] http://www.ir-facility.org/pair-workshops
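The abstract evaluates where Wikipedia articles rank by similarity to a patent, and the text argues that recognition must use full article text. This excerpt does not say which similarity measure is used. The sketch below therefore substitutes a standard measure, assumed here purely for illustration: TF-IDF weighting with cosine similarity over plain strings. The function names, the toy patent, and the three article snippets are hypothetical. No filtering or per-section weighting, as mentioned in the abstract, is attempted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    token_lists = [tokenize(t) for t in texts]
    n = len(texts)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in Counter(tokens).items()
        }
        for tokens in token_lists
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(patent_text: str, articles: dict[str, str]) -> list[tuple[str, float]]:
    """Rank article titles by cosine similarity of their full text to the patent."""
    titles = list(articles)
    # IDF statistics are computed over the patent plus the candidate articles.
    vectors = tfidf_vectors([patent_text] + [articles[t] for t in titles])
    patent_vec, article_vecs = vectors[0], vectors[1:]
    scored = [(title, cosine(patent_vec, vec)) for title, vec in zip(titles, article_vecs)]
    # Highest similarity first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data, not taken from the paper.
PATENT = "A wind turbine blade with a carbon fibre spar reduces rotor weight"
ARTICLES = {
    "Wave (physics)": "A wave is a disturbance that transfers energy through matter or space",
    "Carbon fibre reinforced polymer": (
        "Carbon fibre reinforced polymer is a strong light material used in blades and spars"
    ),
    "Wind turbine": "A wind turbine converts wind energy into electricity using a rotor with blades",
}

if __name__ == "__main__":
    for rank, (title, score) in enumerate(rank_articles(PATENT, ARTICLES), start=1):
        print(f"{rank}. {title}: {score:.3f}")
```

On this toy data the wind-turbine article ranks first and the unrelated wave article ranks last. A rank-based evaluation would look at the position of the expected article in this list, for example whether it comes first or falls within the top 12.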

Similar resources

Events Retrieval Using Enhanced Semantic Web Knowledge

In this article, we present an experimental end user application to query DeRiVE 2011 challenge dataset in an innovative and intuitive manner. After enriching the dataset with external sources of information, it is indexed in a way that enables users to submit queries combining keywords, location and temporal anchor, in a single search field. The goal is to ease event retrieval providing a simp...

Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed

Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) for improving recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option to expand patent search beyond pure keywords...

How to Use Patent Information to Search Potential Technology Partners in Open Innovation

With the increasing trend towards collaborations for innovation across organizational boundaries, the strategic gravity of exploring potential technology partners has been accentuated in the paradigm of open innovation. However, as the openness across nations or industries has become broad, the conventional approaches to searching external partners have encountered a number of difficulties. The...

Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis

In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...

Extracting the significant-rare keywords for patent analysis

Brainstorming for keywords is used in retrieving patent documents, but even experienced engineers are irresolute in dealing with this critical issue. The quality of a patent report is usually already determined by the keywords they used in the first step. In order to improve the stumbling stone, this paper demonstrates a new method of how to find the significant-rare in a patent database. The re...

Publication date: 2013